An AWS Lambda function for rendering PreFigure #920
dqnykamp wants to merge 5 commits into Doenet:main from
Pull request overview
Adds an AWS Lambda (container image) service and supporting artifacts to power the prefigure.doenet.org/build endpoint that renders PreFigure XML into SVG plus optional annotation XML, with a DynamoDB-backed cache.
Changes:
- Introduces a Python Lambda handler that runs `prefig build`, returns a JSON contract, and caches results in DynamoDB (plus an in-memory L1 cache).
- Adds a Dockerfile to build the Lambda container image with PreFigure and its native dependencies.
- Adds deployment/testing assets: a draft CloudFormation template, an endpoint testing checklist, and a browser-based test page.
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| prefigure-lambda/app.py | Lambda handler: request parsing, invoking prefig, response shaping, and hybrid (RAM+DynamoDB) caching. |
| prefigure-lambda/Dockerfile | Container build for Lambda runtime including liblouis/pycairo/prefigure and prefig init. |
| prefigure-lambda/prefigure-stack.yml | Draft CloudFormation template for Lambda + DynamoDB + HTTP API + custom domain mapping. |
| prefigure-lambda/ENDPOINT_TESTING.md | Manual verification checklist and curl/jq snippets for endpoint behavior. |
| prefigure-lambda/test-prefigure.html | Simple browser client to POST XML to /build and render returned SVG/annotations. |
| .gitignore | Normalizes .cspell ignore entry and adds Python __pycache__ ignores. |

    # 5. Run Prefigure
    cmd = ["prefig", "build", input_filename]

    # We switch CWD to work_dir so 'output/' is created there
    result = subprocess.run(
        cmd,
        cwd=work_dir,
        capture_output=True,
        text=True
    )

`subprocess.run(...)` has no timeout. If `prefig build` hangs (e.g., on pathological input), the invocation will run until the Lambda timeout and waste concurrency. Consider passing an explicit `timeout` and handling `subprocess.TimeoutExpired` with a clear `errorCode` (and possibly killing the process group).
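
A minimal sketch of the timeout handling described in this comment. The helper name `run_build`, the 20-second default, and the `errorCode` values `BUILD_TIMEOUT` and `BUILD_FAILED` are assumptions for illustration, not part of the PR. Note that `subprocess.run` kills only the direct child when the timeout expires; cleaning up grandchildren would need a process group, which this sketch does not attempt.

```python
import subprocess


def run_build(cmd, work_dir, timeout_seconds=20):
    """Run a build command with a hard timeout.

    Returns (ok, payload). The payload keys and errorCode values are
    hypothetical; adapt them to the handler's real JSON contract.
    """
    try:
        result = subprocess.run(
            cmd,
            cwd=work_dir,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return False, {
            "errorCode": "BUILD_TIMEOUT",
            "message": f"build exceeded {timeout_seconds}s",
        }

    if result.returncode != 0:
        return False, {"errorCode": "BUILD_FAILED", "stderr": result.stderr}

    return True, {"stdout": result.stdout}
```

Choosing a timeout somewhat below the configured Lambda timeout leaves the handler time to return a structured error instead of being terminated mid-request.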

    LOCAL_CACHE = {}
    dynamodb = boto3.resource('dynamodb')
    table_name = "PrefigureCache"
    table = dynamodb.Table(table_name)

    DEFAULT_HEADERS = {
        'Content-Type': 'application/json',
        'Access-Control-Allow-Origin': '*'
    }

    # --- HELPER FUNCTIONS ---
    def compute_hash(content):
        unique_string = content + CACHE_VERSION
        return hashlib.sha256(unique_string.encode('utf-8')).hexdigest()

    def get_from_cache(xml_hash):
        # 1. Check L1 (RAM)
        if xml_hash in LOCAL_CACHE:
            print(f"L1 MEMORY HIT: {xml_hash}")
            return LOCAL_CACHE[xml_hash]

`LOCAL_CACHE` grows without bounds across warm invocations. A high-cardinality workload (or malicious traffic) can cause the container to retain many large SVG/XML strings and eventually OOM. Consider using a bounded cache (LRU with max entries/bytes) or making the in-memory layer optional via configuration.
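
One way to bound the in-memory layer is a small LRU cache built on `collections.OrderedDict`. This is a sketch under assumptions: the class name `BoundedCache` and the `max_entries` limit are hypothetical, and it caps entry count only, not total bytes.

```python
from collections import OrderedDict


class BoundedCache:
    """LRU cache that evicts the least recently used entry when full."""

    def __init__(self, max_entries=128):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        # Mark the entry as most recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # Evict from the least recently used end until within the limit.
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)

    def __len__(self):
        return len(self._data)
```

A handler could replace the plain dict with `BoundedCache(max_entries=...)` and read the limit from an environment variable, so that setting a small limit keeps the L1 layer effectively minimal.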

    # --- INITIALIZATION ---
    LOCAL_CACHE = {}
    dynamodb = boto3.resource('dynamodb')
    table_name = "PrefigureCache"

The DynamoDB table name is hard-coded (`PrefigureCache`). This makes it harder to deploy multiple environments/stacks (dev/stage/prod) or to rename the table without code changes. Consider reading the table name from an environment variable (with a default) and wiring it in the CloudFormation template.

Suggested change:

    -table_name = "PrefigureCache"
    +table_name = os.getenv("CACHE_TABLE_NAME", "PrefigureCache")

    # 3. Install Liblouis (Braille support) from Source
    # (Standard yum/dnf does not have liblouis, so we build it)
    WORKDIR /tmp/liblouis-build
    RUN git clone https://github.com/liblouis/liblouis.git . && \
        ./autogen.sh && \
        ./configure --enable-ucs4 --prefix=/usr && \
        make && \
        make install && \
        cd python && \
        pip install . && \
        cd / && \
        rm -rf /tmp/liblouis-build

    # 4. Install Pycairo explicitly
    # We do this before prefigure to ensure the C compilation succeeds.
    RUN pip install pycairo

    # 5. Install Prefigure
    # We use the [pycairo] extra to tell prefig we have it.
    RUN pip install "git+https://github.com/davidaustinm/prefigure.git#egg=prefig[pycairo]"

This Docker build pulls source directly from GitHub default branches (`git clone` of liblouis and `pip install git+.../prefigure.git`) without pinning to a tag/commit. That makes builds non-reproducible and increases supply-chain risk if upstream changes. Consider pinning to specific versions/SHAs (and optionally verifying checksums) so the deployed Lambda image is deterministic.

    const svg = data.svg;

    if (svg) {
      svgContainer.innerHTML = svg;
    } else {
      svgContainer.textContent =
        "Error: No SVG found in response: " + JSON.stringify(data);
    }
    const cml = data.xml;

    if (cml) {
      cmlContainer.innerHTML = cml;
    } else {
      cmlContainer.textContent =
        "Error: No CML found in response: " + JSON.stringify(data);
    }

The test client renders API-provided strings with `innerHTML` (`svgContainer.innerHTML = svg` and `cmlContainer.innerHTML = cml`). If the response ever contains unexpected markup (especially SVG with scripts/event handlers), opening this page can execute it. Consider sanitizing before injecting, or using `textContent` for the XML/annotations display (and a safer SVG parsing approach if you need to render SVG).
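
As a complement to client-side fixes, the handler could strip obviously active content from the SVG before returning it. The following is a rough sketch of that idea, not a complete sanitizer and not something the PR does: the function name `strip_active_content` and the deny-list are assumptions, and a maintained sanitization library would be a safer choice for untrusted input.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Hypothetical deny-list; a real sanitizer would use a vetted allow-list.
BLOCKED_TAGS = {"script", "foreignObject"}


def _local(name):
    """Strip a '{namespace}' prefix from a tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def strip_active_content(svg_text):
    """Remove script-bearing elements and event-handler attributes."""
    # Serialize SVG elements with a default namespace instead of 'ns0:'.
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_text)

    # Drop blocked child elements anywhere in the tree.
    for parent in list(root.iter()):
        for child in list(parent):
            if _local(child.tag) in BLOCKED_TAGS:
                parent.remove(child)

    # Drop on* handlers and javascript: links.
    for element in root.iter():
        for name in list(element.attrib):
            value = element.attrib[name].strip().lower()
            is_handler = _local(name).lower().startswith("on")
            is_js_link = _local(name) == "href" and value.startswith("javascript:")
            if is_handler or is_js_link:
                del element.attrib[name]

    return ET.tostring(root, encoding="unicode")
```

This does not remove the need to avoid `innerHTML` for the annotations XML in the test page, where `textContent` is enough for display.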

    <script
      type="text/javascript"
      src="https://cdn.jsdelivr.net/npm/diagcess@1.4.0/dist/diagcess.js"
      defer
    ></script>

The page loads `diagcess.js` from a third-party CDN without Subresource Integrity (SRI). Even for a test harness, consider adding an `integrity` hash and `crossorigin` attribute, or documenting why this is acceptable, to reduce supply-chain risk.
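
An SRI value is the hash algorithm name, a hyphen, and the base64-encoded digest of the exact file bytes. The helper below sketches how such a value could be computed; the function name `sri_hash` is hypothetical, and the bytes passed in would need to be the pinned `diagcess.js` file as served by the CDN.

```python
import base64
import hashlib


def sri_hash(data: bytes, algorithm: str = "sha384") -> str:
    """Return a Subresource Integrity string such as 'sha384-<base64 digest>'."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"
```

The resulting string goes in the script tag's `integrity` attribute, alongside `crossorigin="anonymous"`, so the browser refuses to run the file if its contents change.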

      Description: The ARN of the ACM Certificate for the domain (must be in the same region)

    HostedZoneId:
      Type: String

`HostedZoneId` is described as optional, but as a CloudFormation parameter with no `Default` it becomes required at deploy time even though the `DnsRecord` resource is commented out. Consider either removing this parameter until the DNS resource is enabled, or giving it a default (e.g., an empty string) and gating the DNS record behind a `Condition` so stacks can be created without supplying a zone ID.

Suggested change:

    -  Type: String
    +  Type: String
    +  Default: ""

    def lambda_handler(event, context):
        debug = False
        query_params = event.get('queryStringParameters') or {}
        if query_params.get('debug') in ('1', 'true', 'True', 'yes'):

The `debug` query param enables returning stdout/stderr, working directory paths, and directory listings to any caller. If this endpoint is public, this is an information disclosure risk; consider disabling debug in production (env flag), requiring an auth token/header, or returning only a request ID while logging full diagnostics to CloudWatch.

Suggested change:

    -    if query_params.get('debug') in ('1', 'true', 'True', 'yes'):
    +    # Only allow debug mode if explicitly enabled via environment variable.
    +    allow_debug_env = os.environ.get('ALLOW_DEBUG', '').lower()
    +    if allow_debug_env in ('1', 'true', 'yes') and query_params.get('debug') in ('1', 'true', 'True', 'yes'):
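
The "request ID only" option from this comment could look roughly like the sketch below. The helper name `error_response`, the response field names, and the 500 status code are assumptions; `aws_request_id` is the attribute the Lambda context object exposes for the current invocation, and anything printed by the function ends up in its CloudWatch log stream.

```python
import json


def error_response(context, error_code, diagnostics, debug_allowed=False):
    """Log full diagnostics, but return them to the caller only if allowed."""
    request_id = getattr(context, "aws_request_id", "unknown")

    # Full details go to the logs, keyed by request ID for later lookup.
    print(json.dumps({
        "requestId": request_id,
        "errorCode": error_code,
        "diagnostics": diagnostics,
    }))

    body = {"errorCode": error_code, "requestId": request_id}
    if debug_allowed:
        # Only reached when debug is explicitly enabled for this deployment.
        body["diagnostics"] = diagnostics

    return {
        "statusCode": 500,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }
```

A caller who hits an error can then report the `requestId`, and a maintainer can find the matching log entry without the endpoint exposing paths or directory listings publicly.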
|
This PR adds the code that was used to create the prefigure.doenet.org endpoint, which renders a PreFigure source XML file into the resulting SVG and annotations file. It includes the Dockerfile used for the endpoint and instructions on how to deploy to it. It also includes a .yml for eventually deploying this via CloudFormation, though that .yml is just an AI summary of the steps we took and has not been validated.